    Multiclass discovery in array data

    BACKGROUND: A routine goal in the analysis of microarray data is to identify genes with expression levels that correlate with known classes of experiments. In a growing number of array data sets, it has been shown that there is an over-abundance of genes that discriminate between known classes compared with what is expected for random classes. Therefore, one can search for novel classes in array data by looking for partitions of experiments for which there is an over-abundance of discriminatory genes. We have previously used such an approach in a breast cancer study. RESULTS: We describe the implementation of an unsupervised classification method for class discovery in microarray data. The method allows for the discovery of more than two classes. We applied our method to two published microarray data sets: small round blue cell tumors and breast tumors. The method predicts relevant classes in the data sets with high success rates. CONCLUSIONS: We conclude that the proposed method is accurate and efficient in finding biologically relevant classes in microarray data. Additionally, the method is useful for quality control of microarray experiments. We have made the method available as a computer program.
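A minimal sketch of the core idea: score a candidate partition of samples by counting genes whose one-way ANOVA F statistic exceeds a threshold, then compare the best partition with random partitions. The F-statistic criterion, the threshold, and the exhaustive search (feasible only for a handful of samples) are illustrative assumptions, not the authors' implementation. A real data set would need a heuristic optimizer instead of enumeration.

```python
import itertools
import random
from statistics import mean


def f_statistic(values, labels, k):
    """One-way ANOVA F statistic of one gene for a k-class partition of samples."""
    groups = [[v for v, c in zip(values, labels) if c == g] for g in range(k)]
    if any(len(g) < 2 for g in groups):
        return 0.0  # partitions with (near-)empty classes are treated as uninformative
    grand = mean(values)
    n = len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        # perfectly separated classes give an infinite F; a constant gene gives 0
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def n_discriminatory(data, labels, k, threshold):
    """Number of genes (rows of data) whose F statistic exceeds threshold."""
    return sum(f_statistic(row, labels, k) > threshold for row in data)


def best_partition(data, k, threshold):
    """Exhaustive search over all k-class labelings (tiny data sets only)."""
    n = len(data[0])
    best_labels, best_score = None, -1
    for labels in itertools.product(range(k), repeat=n):
        score = n_discriminatory(data, labels, k, threshold)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


def random_baseline(data, k, threshold, n_draws=200, seed=0):
    """Mean number of discriminatory genes over random labelings."""
    rng = random.Random(seed)
    n = len(data[0])
    return mean(
        n_discriminatory(data, [rng.randrange(k) for _ in range(n)], k, threshold)
        for _ in range(n_draws)
    )


base = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
data = [[x + g for x in base] for g in range(5)]  # 5 genes separating samples 0-2 from 3-5
data.append([0, 1, 0, 1, 0, 1])                    # 1 gene with an alternating pattern
print(best_partition(data, k=2, threshold=10.0))   # → ((0, 0, 0, 1, 1, 1), 5)
print(random_baseline(data, k=2, threshold=10.0))  # far below 5
```

The gap between the best partition's count and the random-partition baseline is the "over-abundance" that signals a candidate class structure.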

    Revealing signaling pathway deregulation by using gene expression signatures and regulatory motif analysis

    A strategy is presented for identifying cell signaling pathways whose deregulation results in an observed expression signature.

    The Landscape of Candidate Driver Genes Differs between Male and Female Breast Cancer

    The rapidly growing collection of diverse genome-scale data from multiple tumor types sheds light on various aspects of the underlying tumor biology. With the objective of identifying genes of importance for breast tumorigenesis in men, and to enable comparisons with genes important for breast cancer development in women, we applied the computational framework COpy Number and EXpression In Cancer (CONEXIC) to distinguish candidate driver genes from the many altered passenger genes. Unique to this approach is that each driver gene is associated with several gene modules that are believed to be altered by the driver. Thirty candidate drivers were found in the male breast cancers and 67 in the female breast cancers. In the female dataset, we identified many known drivers of breast cancer and other cancer types (e.g., GATA3, CCNE1, GRB7, CDK4). In contrast, only three known cancer genes were found among the male breast cancers: MAP2K4, LHP, and ZNF217. Many of the candidate drivers identified are known to be involved in processes associated with tumorigenesis, including proliferation, invasion, and differentiation. One of the modules identified in male breast cancer was regulated by THY1, a gene involved in invasion and related to epithelial-mesenchymal transition. Furthermore, men with THY1-positive breast cancers had significantly inferior survival. THY1 may thus be a promising novel prognostic marker for male breast cancer. Another module identified among male breast cancers, regulated by SPAG5, was closely associated with proliferation. Our data indicate that male and female breast cancers display highly different landscapes of candidate driver genes, as only a few genes were found in common between the two. Consequently, the pathobiology of male breast cancer may differ from that of female breast cancer and may be associated with differences in prognosis; men diagnosed with breast cancer may therefore require different management and treatment strategies from women.

    Folding Free Energies of 5′-UTRs Impact Post-Transcriptional Regulation on a Genomic Scale in Yeast

    Abundances and other features of genes and proteins have been measured on a genome-wide scale in Saccharomyces cerevisiae using high-throughput technologies. In contrast, secondary structure in the 5′-untranslated regions (UTRs) of mRNA has only been investigated for a limited number of genes. Here, we study the genome-wide regulatory effects of mRNA 5′-UTR folding free energies. We computed the secondary structures of 5′-UTRs and their folding free energies for all verified genes in S. cerevisiae. We found significant correlations between folding free energies of 5′-UTRs and various transcript features measured in genome-wide studies of yeast. In particular, mRNAs with weakly folded 5′-UTRs have higher translation rates, higher abundances of the corresponding proteins, longer half-lives, and higher numbers of transcripts, and are upregulated after heat shock. Furthermore, 5′-UTRs have significantly higher (that is, less negative) folding free energies than other genomic regions and randomized sequences. We also found a positive correlation between transcript half-life and ribosome occupancy that is more pronounced for short-lived transcripts, which supports a picture of competition between translation and degradation. Among the genes with strongly folded 5′-UTRs, uncharacterized open reading frames are strongly overrepresented. Based on our analysis, we conclude that (i) there is a widespread bias for 5′-UTRs to be weakly folded, (ii) folding free energies of 5′-UTRs are correlated with mRNA translation and turnover on a genomic scale, and (iii) transcripts with strongly folded 5′-UTRs are often rare and hard to find experimentally.

    Genome-wide transcription factor binding site/promoter databases for the analysis of gene sets and co-occurrence of transcription factor binding motifs

    Background: The use of global gene expression profiling is a well-established approach to understanding biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures may be very informative and reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify transcription factors, or transcription factor binding sites (TFBS), likely to be important for the expression of the gene signatures. Results: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases comprise all genes in the Santa Cruz database and the positions of all TFBS provided by the user as position weight matrices. These databases are then used for two purposes: to identify significant TFBS in the promoters of sets of genes, and to identify clusters of co-occurring TFBS. We use two criteria for significance: significantly enriched TFBS, in terms of the total number of binding sites in the promoters, and significantly present TFBS, in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a resampling procedure in which the query gene set is compared with typically 10^5 gene lists of similar size randomly drawn from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-chip data sets and show that the proposed approach faithfully reproduces ChIP-chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes. Conclusions: The presented approach to promoter analysis faithfully reproduces the results from several ChIP-chip and MSigDB-derived gene sets and hence may prove to be an important method in the analysis of gene signatures obtained through ChIP-chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes.
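The resampling step can be sketched as follows, assuming a hypothetical `site_counts` dictionary that maps every gene in the TFBS/promoter database to its number of predicted sites for one TF. Both criteria from the abstract are computed: enrichment (total number of sites) and presence (number of promoters with at least one site). Because every random list has the same size as the query, comparing counts is equivalent to comparing fractions. The one-sided "at least as extreme" rule and the add-one correction are our assumptions.

```python
import random


def resampling_pvalues(query, site_counts, n_resamples=100_000, seed=0):
    """One-sided empirical p-values for TFBS enrichment and presence.

    site_counts: hypothetical dict mapping every gene in the database to the
    number of predicted binding sites for one TF in its promoter.
    Returns (p_enriched, p_present).
    """
    rng = random.Random(seed)
    universe = list(site_counts)
    n = len(query)
    obs_total = sum(site_counts[g] for g in query)        # enrichment statistic
    obs_present = sum(site_counts[g] > 0 for g in query)  # presence statistic
    hits_total = hits_present = 0
    for _ in range(n_resamples):
        sample = rng.sample(universe, n)  # random gene list of the same size
        if sum(site_counts[g] for g in sample) >= obs_total:
            hits_total += 1
        if sum(site_counts[g] > 0 for g in sample) >= obs_present:
            hits_present += 1
    # add-one correction keeps the estimate away from exactly zero
    return ((hits_total + 1) / (n_resamples + 1),
            (hits_present + 1) / (n_resamples + 1))


# Toy database: 100 genes, of which g0..g9 each carry 3 sites.
counts = {f"g{i}": (3 if i < 10 else 0) for i in range(100)}
print(resampling_pvalues([f"g{i}" for i in range(10)], counts, n_resamples=1000))
```

With the toy data both p-values come out small, because a random list of 10 genes almost never contains as many sites as the query.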

    An investigation of screwiness in hadronic final states from DELPHI

    A recent theoretical model by Andersson et al. proposes that soft gluons order themselves in the form of a helix at the end of the QCD cascade. The authors of the model present a measure of the rapidity-azimuthal angle correlation, which they call screwiness. We searched for such a signal in DELPHI data and found no evidence for screwiness.
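To make the rapidity-azimuth correlation concrete, the sketch below assumes a screwiness measure of the form S(ω) = Σ_e P_e |Σ_j exp(i(ω y_j − φ_j))|², where y_j and φ_j are the rapidity and azimuthal angle of particle j in event e, P_e is an event weight, and ω is a trial helix pitch. This form is an assumption for illustration; the model paper should be consulted for the exact definition. For an ideal helix with φ_j = ω₀ y_j, every term equals 1 at ω = ω₀, so one event with n particles contributes n².

```python
import cmath


def screwiness(events, omega, weights=None):
    """Assumed form: S(omega) = sum_e P_e * |sum_j exp(i*(omega*y_j - phi_j))|**2.

    events: list of events, each a list of (rapidity y, azimuthal angle phi) pairs.
    weights: optional event weights P_e (default 1.0 for every event).
    """
    if weights is None:
        weights = [1.0] * len(events)
    total = 0.0
    for w, particles in zip(weights, events):
        amplitude = sum(cmath.exp(1j * (omega * y - phi)) for y, phi in particles)
        total += w * abs(amplitude) ** 2
    return total


# An ideal helix with pitch parameter 2: phi_j = 2 * y_j for every particle.
helix = [[(y, 2.0 * y) for y in (-1.0, -0.5, 0.0, 0.5, 1.0)]]
for omega in (0.5, 1.0, 2.0, 3.0):
    print(omega, round(screwiness(helix, omega), 3))  # largest at omega = 2 (value 25)
```

In data without helical ordering the azimuthal phases are uncorrelated with rapidity, so no trial ω produces such a peak.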

    Normalization of array-CGH data: influence of copy number imbalances

    Background: High-resolution microarray-based comparative genomic hybridization (CGH) techniques have been successfully applied to study copy number imbalances in a number of settings, such as the analysis of cancer genomes. For normalization of array-CGH data, methods initially developed for gene expression microarray analysis have generally been adopted directly. However, these methods are designed to work under assumptions that may not be valid for array-CGH data when copy number imbalances are present. We therefore investigated the effect of copy number imbalances on normalization. Results: Here we demonstrate that copy number imbalances correlate with intensity in array-CGH data, thereby causing problems for conventional normalization methods. We propose a strategy to circumvent these problems by taking copy number imbalances into account during normalization, and we test the proposed strategy using several data sets from the analysis of cancer genomes. In addition, we show how the strategy can be applied to conveniently define adaptive, sample-specific boundaries between balanced copy number, losses, and gains, to help manage variation in tissue heterogeneity when calling copy number changes. Conclusion: We highlight the importance of considering copy number imbalances during normalization of array-CGH data, and show how failing to do so can adversely affect the data and hamper interpretation.
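The failure mode can be illustrated with a deliberately simplified normalization: global median centering of log2 ratios rather than the intensity-dependent methods the abstract refers to. The sketch assumes hypothetical per-probe `balanced` flags, for example from a preliminary segmentation, and estimates the offset only from those probes. This is one simple way to take imbalances into account, not necessarily the authors' strategy.

```python
from statistics import median


def center_all(log_ratios):
    """Conventional global median centering: assumes most probes are balanced."""
    offset = median(log_ratios)
    return [r - offset for r in log_ratios]


def center_on_balanced(log_ratios, balanced):
    """Median centering that uses only probes flagged as copy-number balanced.

    balanced: hypothetical per-probe flags, e.g. from a preliminary segmentation.
    """
    ref = [r for r, b in zip(log_ratios, balanced) if b]
    if not ref:
        raise ValueError("no balanced probes to normalize against")
    offset = median(ref)
    return [r - offset for r in log_ratios]


# Toy sample: a +0.2 dye bias on every probe, and 6 of 10 probes in a gained region.
ratios = [0.2] * 4 + [0.78] * 6
flags = [True] * 4 + [False] * 6
# Global centering pushes balanced probes to about -0.58 (a spurious loss);
# centering on balanced probes keeps them near 0 and the gain near +0.58.
print(center_all(ratios)[0], center_on_balanced(ratios, flags)[0])
```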

    The gene expression landscape of breast cancer is shaped by tumor protein p53 status and epithelial-mesenchymal transition

    Introduction: Gene expression data derived from clinical cancer specimens provide an opportunity to characterize cancer-specific transcriptional programs. Here, we present an analysis delineating a correlation-based gene expression landscape of breast cancer that identifies modules with strong associations to breast cancer-specific and general tumor biology. Methods: Modules of highly connected genes were extracted from a gene co-expression network that was constructed based on Pearson correlation, and module activities were then calculated using a pathway activity score. Functional annotations of modules were experimentally validated with an siRNA cell spot microarray system using the KPL-4 breast cancer cell line, and by using gene expression data from functional studies. Modules were derived using gene expression data representing 1,608 breast cancer samples and validated in data sets representing 971 independent breast cancer samples as well as 1,231 samples from other cancer forms. Results: The initial co-expression network analysis resulted in the characterization of eight tightly regulated gene modules. Cell cycle genes were divided into two transcriptional programs, and experimental validation using an siRNA screen showed different functional roles for these programs during proliferation. The division of the two programs was found to act as a marker for tumor protein p53 (TP53) gene status in luminal breast cancer, with the two programs being separated only in luminal tumors with functional p53 (encoded by TP53). Moreover, a module containing fibroblast and stroma-related genes was highly expressed in fibroblasts, but was also up-regulated by overexpression of epithelial-mesenchymal transition factors such as transforming growth factor beta 1 (TGF-beta1) and Snail in immortalized human mammary epithelial cells. 
Strikingly, the stroma transcriptional program was associated with less malignant tumors in luminal disease but with aggressive, lymph node-positive disease among basal-like tumors. Conclusions: We have derived a robust gene expression landscape of breast cancer that reflects known subtypes as well as heterogeneity within these subtypes. By applying the modules to TP53-mutated samples, we shed light on the biological consequences of non-functional p53 in otherwise low-proliferating luminal breast cancer. Furthermore, as the stroma module illustrates, the biological and clinical interpretation of a set of co-regulated genes is subtype-dependent.

    A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer

    There is often a need to predict the impact of alterations in one variable on another variable. This is especially the case in cancer research, where much effort has been devoted to large-scale gene expression screening by microarray techniques. However, the causes of the variability in gene expression from one cancer to another, and from one gene to another, often remain unknown. In this study, we present a systematic procedure for finding genes whose expression is altered by an intrinsic or extrinsic explanatory phenomenon. The procedure has three stages: preprocessing, data integration, and statistical analysis. We tested and verified the utility of this approach in a study in which the expression and copy number of 13,824 genes were determined in 14 breast cancer samples. The expression of 270 genes could be explained by variation in gene copy number. These genes may represent an important set of primary, genetically "damaged" genes that drive cancer progression.